Abstract

Over the past several years, DNA sequencing has 
emerged as one of the driving forces in life-sciences, 
paving the way for affordable and accurate whole ge- 
nome sequencing. As genomes represent the entirety 
of an organism's hereditary information, the availabil- 
ity of complete human genomes prompts a wide range 
of revolutionary applications. The hope for improving 
modern healthcare and better understanding the human 
genome propels many interesting and challenging re- 
search frontiers. Unfortunately, however, the prolifer- 
ation of human genomes amplifies worrisome privacy 
concerns, since a genome represents a treasure trove of 
highly personal and sensitive information. In this ar- 
ticle, we provide an overview of positive results and 
biomedical advances in the field, and discuss privacy 
issues associated with human genomic information. Fi- 
nally, we survey available privacy-enhancing technolo- 
gies and list a number of open research challenges. 

1 Introduction 

Over the past half a century, DNA sequencing has
been one of the most active and fast-paced areas of research
in the life sciences, yielding complete sequences
of many eukaryotic organisms, including humans [1, 2]. A
key, revolutionary role in this context has been played
by High-Throughput Sequencing (HTS) techniques.
Scientists sequenced the first diploid human
genome in 2007 [3] and recently completed a project to
sequence 1,000 human genomes [4].

The $3B, 13-year Human Genome Project [2]
involved a number of research institutions worldwide
and is considered one of the major breakthroughs of
this century. Nowadays, different HTS technologies
are competing to accurately sequence an individual human
genome, composed of about 3 billion DNA nucleotides,
at prices affordable for a large number of
individuals.

The race for cheaper and more accurate whole genome
sequencing technologies has been quite exciting,
with costs plunging from $1B only a decade ago to
$250,000 in 2008 (by Illumina), and to about $4,400
a couple of years ago (in 2009 by Complete Genomics [5]
and in 2011, again, by Illumina [6]). Life
Technologies announced this year that they can sequence
a full genome for $1,000 [7], while Oxford Nanopore
announced their intent to commercialize a sequencer
the size of a USB memory stick that can sequence a
whole genome for $900 in 15 minutes [8]. Geniachip
claims to go beyond this and deliver the same results
for just $100. Large corporations, such as IBM and GE,
have also entered the race in the last couple of years,
while, recently, Roche has unsuccessfully attempted to
acquire Illumina.

Nowadays, the landscape of companies and technologies
competing in this sector changes so quickly that it
is hard to keep a comprehensive, up-to-date
list. Nonetheless, it is evident that whole genome
sequencing will be a reality in the near future, a
commodity costing less than an X-ray or an MRI scan.

In this article, we discuss how advances in whole
genome sequencing have created contrasting feelings
about the implications of widespread availability
of whole genomes. On the one hand, the hope for
improving modern healthcare and better understanding
the human genome has attracted significant research investments
and, arguably, generated many groundbreaking
results. On the other hand, however, a number of
alarming privacy and ethical concerns have been raised
pertaining to the sensitivity of human genomic information
and its disclosure.

We provide an overview of positive results and recent 
biomedical advances in the field (Sec. 2), and discuss 
privacy issues associated with human genomic infor- 
mation (Sec. 3). Finally, we provide a list of a few com- 
pelling research challenges in the area (Sec. 4) and sur- 
vey the state of the art in privacy-enhancing technolo- 
gies focusing on computational genomic tests (Sec. 5). 

2 The good news: Beyond Personal- 
ized Medicine 

Undoubtedly, ubiquity of human genomes creates 
enormous opportunities and challenges. In particular, 
it promises to launch a new era of genome-enabled 
predictive, preventive, participatory, and personalized 
medicine ("P4 medicine") [9].

Personalized Medicine is recognized as a signifi- 
cant paradigm shift and a major trend in health care, 
moving us closer to a more precise, powerful, and holis- 
tic type of medicine [10]. With personalized medi- 
cine, treatment and medication type/dosage would be 
tailored to the precise genetic makeup of individual pa- 
tients. Experts predict that advances in whole genome 
sequencing will further stimulate development of per- 
sonalized medicine [11]. Commercial companies like 
Knome already offer services that take raw genome 
data and create usable reports for doctors. In general, 
the availability of a patient's fully sequenced genome 
will enable clinicians, doctors, and testing facilities to 
run a number of complex, correlated genetic tests in a 
matter of seconds, using specialized computational al- 
gorithms (as opposed to more expensive and slower in- 
vitro tests). 

Already today, personalized medicine is a reality
in a number of medical scenarios. Measurements of
erbB2 protein in breast, lung, or colorectal cancer patients
are taken before selecting proper treatment. It has
been shown that the trastuzumab monoclonal antibody
is effective only in patients whose genetic receptor is
over-expressed [12]. Also, testing for the thiopurine S-methyltransferase
(tpmt) gene is required prior to prescribing
6-mercaptopurine and azathioprine - two
drugs used for treating childhood leukemia and autoimmune
diseases. The tpmt gene codes for the TPMT enzyme
that metabolizes thiopurine drugs: genetic polymorphisms
affecting enzymatic activity are correlated
with variations in sensitivity and toxicity response to
such drugs. Patients suffering from this genetic disease
(1 in 300) only need 6-10% of the standard dose
of thiopurine drugs; if treated with the full dose, they
risk severe bone marrow suppression and subsequent
death [13]. Similarly, doctors who want to prescribe
Zelboraf (Roche's treatment for advanced skin cancer)
first test the patient for the BRAF V600E mutation,
which is found in about half of all cases. Other analogous
examples include the Philadelphia chromosome
mutations related to Acute Lymphoblastic Leukemia
(ALL) and BRCA1/BRCA2 genes in correlation to familial
breast and ovarian cancer syndromes.

Experts estimate that about a third of the 900 can- 
cer drugs currently in clinical trials could soon come 
to market with a DNA or other molecular test at- 
tached [14]. Although most predominant, cancer treat- 
ment is only one of the application fields of personal- 
ized medicine. For instance, a recent Canadian study 
has shown how, for some cardiac patients, recovery 
from a common heart procedure can be complicated 
by a single gene responsible for drug processing, and 
that selection of blood thinner drugs should depend on 
whether or not the patient carries such a gene mutation [15].
Also, in a study that shows how whole-genome se- 
quencing could be used in life-or-death medical situa- 
tions involving newborns, researchers at a hospital in 
Kansas analyzed the entire genomes of seven babies 
who died near birth, accurately diagnosing five of them
with critical conditions within about 50 hours each - 
fast enough to be meaningful to their care [16]. 

Tremendous advances in Pharmacogenomics (the
study of the impact of genetic variation on the re- 
sponse to medications) are also driving research in the 
field. Examples include genes involved in the action 
and metabolism of warfarin (Coumadin), a medication 
used as an anticoagulant [17], as well as genes encod- 
ing Cytochrome P450 enzymes, which metabolize neu- 
roleptic medications to improve drug response and re- 
duce side-effects [18]. 

The availability of whole human genomes will also
make a number of genetic tests that today are performed
in vitro, such as paternity/ancestry testing and
genetic compatibility testing, cheaper and faster.

For instance, genetic paternity tests may be run
very efficiently computationally, by designing algorithms
that emulate in-vitro, highly accurate, court-admissible
tests, e.g., based on Short Tandem Repeats
(STRs) and Restriction Fragment Length Polymorphisms
(RFLPs).¹ Actually, running those algorithms
on whole genomes could even improve the accuracy
of paternity tests: experts point out that, while any
two unrelated humans share about 99.5% of their ge- 
nomes, individuals tied by a parent-child relationship 
have 99.8% of their genome in common. Thus, one 
could realize error-free paternity tests by counting the 
number of matching nucleotides across test takers. 
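
The counting idea above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: both genomes are given as already aligned, equal-length strings of nucleotides, and the decision threshold is a hypothetical value placed between the 99.5% and 99.8% figures quoted above. The function names are ours and do not come from any existing tool.

```python
def shared_fraction(genome_a: str, genome_b: str) -> float:
    """Fraction of aligned positions where both sequences carry the same nucleotide."""
    if not genome_a or len(genome_a) != len(genome_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(genome_a, genome_b) if x == y)
    return matches / len(genome_a)


# Hypothetical cut-off, chosen between the 99.5% (unrelated) and
# 99.8% (parent-child) similarity figures mentioned in the text.
PARENT_CHILD_THRESHOLD = 0.9965


def looks_like_parent_child(genome_a: str, genome_b: str,
                            threshold: float = PARENT_CHILD_THRESHOLD) -> bool:
    """Return True if the shared fraction reaches the (assumed) threshold."""
    return shared_fraction(genome_a, genome_b) >= threshold
```

On real data, the alignment step, sequencing errors, and the diploid structure of human genomes would all have to be handled before such a count becomes meaningful.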

On a similar note, ancestry and genealogical testing
allows individuals to trace their lineage by analyzing
their genomic information (the scope of such tests
is often quite heterogeneous). Ancestry testing is
useful in a myriad of health-related applications (e.g., 
susceptibility to diseases common to certain popula- 
tions) but it is also increasingly used in social or recre- 
ational scenarios, e.g., to map one's own genetic her- 
itage or find common ancestry. Several commercial 
entities (e.g., 23andMe [19]) already maintain a col- 
lection of sample genomes from individuals belonging 
to different ethnic groups, and compare them against 
their customers' genomic information to understand 
how they relate to known ethnic groups. 

Genetic compatibility tests let potential or exist- 
ing partners assess the possibility of transmitting to 
their children a genetic disease with Mendelian inheri- 
tance [20]. Modern genetic testing can accurately pre- 
dict whether a couple is at risk of conceiving a child 
with an autosomal recessive disease. Consider, for instance,
Beta-Thalassemia minor, which causes red blood cells
to be smaller than average, due to a mutation in the hbb 
gene. It is called minor when the mutation occurs only 
in one allele. This minor form has no severe impact on 
a subject's quality of life. However, the major variant — 
that occurs when both alleles carry the mutation — is 
likely to result in premature death, usually, before age 
twenty. Therefore, if both partners silently carry the mi- 
nor form, there is a 25% chance that their child could 
carry the major variety. 
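
The 25% figure follows from enumerating the four equally likely allele combinations a child can inherit. A minimal sketch, assuming a single gene with two alleles written as "A" (normal) and "a" (mutated), so that a silent carrier has genotype "Aa"; the function name and the genotype encoding are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1: str, parent2: str) -> dict:
    """Probability of each child genotype, given two-allele parent genotypes such as "Aa"."""
    counts = {}
    # Each parent passes on one of its two alleles with equal probability.
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))  # "aA" and "Aa" are the same genotype
        counts[genotype] = counts.get(genotype, 0) + 1
    total = len(parent1) * len(parent2)
    return {genotype: Fraction(count, total) for genotype, count in counts.items()}


# Two carriers of the minor form: the child has both mutated alleles ("aa")
# in one of the four equally likely combinations.
carriers = offspring_distribution("Aa", "Aa")
```

Here `carriers["aa"]` is `Fraction(1, 4)`, i.e., the 25% chance of the major variant, while `carriers["Aa"]` is `Fraction(1, 2)`.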

In general, genetic tests are routinely used for sev- 
eral purposes, such as newborn screening, confirma- 
tional diagnostics, as well as pre-symptomatic test- 
ing, e.g., predicting Huntington's disease [21] and es- 
timating risks of various congenital diseases. In fact, 
23andMe [19] provides relatively low-cost genetic tests 
for 960,000 specific Single-Nucleotide Polymorphisms
(SNPs). (SNPs are the most common form of DNA
variation, occurring when a single nucleotide differs between
members of the same species or paired chromosomes
of an individual [22, 23].) However, while some
diseases (e.g., Huntington's) are caused by mutations in
a single gene and are easily tested in vitro, the risk of
developing other diseases depends on multiple genes,
which makes them difficult to identify. Low-cost genetic
sequencing provides researchers with much more
genomic information, and enables them to identify new
genetic variations as well as run more complicated
tests.

¹ E.g., in an RFLP-based paternity test, individuals' genomes are
probed and cut by enzyme digestion and the test outcome is assessed
based on the similarity of resulting fragments.

While the relationship between advances in whole 
genome sequencing and breakthroughs in personalized 
medicine and genetic tests creates a lot of research "en- 
thusiasm", a number of biomedical experts have also 
expressed doubts related to the limits of gene map- 
ping's power to predict a person's likelihood of devel- 
oping a disease [24]. It remains unclear how the avail- 
ability of large numbers of whole genomes will yield a 
better understanding of the human genome (and correlated
diseases), e.g., through Genome-Wide Association
Studies (GWAS). These studies examine common
genetic variants in a very large set of individuals to find 
out if any variant is associated with, e.g., a disease, and 
possibly correlate the disease to a given ancestry line. 
Additional areas to explore include genetic compatibil- 
ity tests for sperm and organ donors [25], evolution- 
ary studies (e.g., based on genomes of Denisovans and 
Neanderthals [26]), as well as research on genomes of 
crops and animals [27].
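
To make the association test behind GWAS more concrete, the following minimal sketch compares how often one variant appears among individuals with a disease (cases) and without it (controls), using a Pearson chi-square statistic on a 2x2 table. The function name and the counts in the usage note are hypothetical; actual studies test a very large number of variants and therefore also need to correct for multiple testing.

```python
def chi_square_2x2(case_with: int, case_without: int,
                   control_with: int, control_without: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    Rows are cases/controls, columns are variant present/absent.
    A larger value indicates a stronger association between variant and disease.
    """
    a, b, c, d = case_with, case_without, control_with, control_without
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    return n * (a * d - b * c) ** 2 / denominator
```

For instance, with made-up counts of 30 carriers among 100 cases and 10 carriers among 100 controls, `chi_square_2x2(30, 70, 10, 90)` evaluates to 12.5, whereas identical carrier rates in both groups give 0.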

3 The bad news: Privacy, Legal, and 
Ethical Concerns 

Widespread and low-cost availability of HTS tech- 
nologies and genomic data has raised a number of eth- 
ical, security, and privacy concerns [28]. The human 
genome not only uniquely and irrevocably identifies its 
owner, but also contains information about ethnic her- 
itage, predisposition to numerous diseases and condi- 
tions, including mental disorders, and many other phe- 
notypic traits [29, 30, 31]. Recent studies suggest that 
even political preferences may be influenced by voters' 
genetic makeup [32]. Furthermore, due to its hereditary 
nature, disclosing one's human genome also implies, to 
a certain extent, disclosing the genomes of close relatives.

Traditional approaches to privacy, such as de-identification
or aggregation [33], become completely
moot in the genomic era, since the genome itself is 
the ultimate identifier. To further compound the pri- 
vacy problem, health information is increasingly shared 
electronically among insurance companies, health care 
providers and employers. This, coupled with the possi- 
bility of creating large centralized genome repositories 
(e.g., for GWAS research), raises the specter of possi- 
ble abuses. (A few results exploring de-anonymizing 
individuals from genomic datasets include [34, 35].) 

Long before whole genome sequencing prices could drop
to a few hundred dollars, society had already envisioned
a future where the issue of genetic discrimination
could dramatically affect social dynamics, hiring
and healthcare practices, and even ways of procreating.
Even popular culture, with sci-fi movies and narrative
literature, has expressed its concerns - for instance, the
concept of genism actually originated from the 1997
Hollywood movie "Gattaca" [36], denoting the theory
that distinctive human characteristics and abilities are
determined by genes, resulting in discrimination based
on DNA sequence characteristics that is as pernicious
as racism [37]. (One could note how genism actually
shares several traits with eugenic ideals prominent in
the hateful policies of some regimes, e.g., the Third Reich.)
The movie led molecular biologist Lee M. Silver
to write in Nature Genetics that "Gattaca is a film that 
all geneticists should see if for no other reason than to 
understand the perception of our trade held by so many 
of the public-at-large" [38]. 

Several funding agencies, e.g., the US National
Human Genome Research Institute (NHGRI), have
established, from the very beginning of the Human
Genome Project in 1990, efforts like the Ethical, Legal
and Social Implications (ELSI) Research Program,
to foster basic and applied research on the ethical,
legal and social implications of genetic and genomic
research for individuals, families and communities.
Some federal laws have been passed to start
addressing privacy issues. The Health Insurance
Portability and Accountability Act (HIPAA), enacted in
1996 with a Privacy Rule in force since 2003, provides a
general framework for protecting and sharing Protected
Health Information (PHI), and, in 2008, the Genetic Information
Nondiscrimination Act (GINA) was adopted
to prohibit discrimination on the basis of genetic information
with respect to health insurance and employment [39].



While providing general guidelines and a basic
safety net, current legislation does not offer detailed
technical information about safe and privacy-preserving
ways of storing and querying genomes. Privacy
practitioners strongly advocate more restrictive
legislation to close gaps in current
policies - see, e.g., a comprehensive list of EPIC's
efforts at http://epic.org/privacy/genetic/. Also, a very
recent report from the Presidential Commission for the 
Study of Bioethical Issues [40] has analyzed advances 
of whole genome sequencing, and highlighted growing 
concerns about privacy and security. The report lists 12 
privacy and security recommendations, including de- 
identification. (On a separate note, while the report 
is related to our article, observe that its scope is quite 
different from ours: we aim at a technical analysis of 
technologies and challenges, whereas [40] provides a
high-level "effort to identify and promote policies and 
practices that ensure scientific research, health care de- 
livery, and technological innovation are conducted in a 
socially and ethically responsible manner".) 

At the policy level, the main open challenges include, 
for instance, the need for informed consent to guard 
against surreptitious DNA testing by requiring author- 
ities and companies to obtain written permission from 
citizens before collecting, analyzing, storing or shar- 
ing their genetic information (e.g., preventing people 
from collecting hair or saliva samples and maliciously 
sequencing the victims' genome). 

On the other hand, some academic researchers fear 
that privacy-restrictive measures could seriously hinder 
genomic research. Scientists typically sequence DNA
from thousands of people to discover genes associated
with particular diseases. An informed consent restriction
would thus mean that large genomic datasets could
not be re-used to study a different disease - researchers
would either need to destroy the data after each study, 
or track down thousands of former subjects for new au- 
thorizations [41]. 

4 Research Challenges 

While privacy issues are not yet hampering progress 
in basic genomic research, it is not too early to inves- 
tigate them, as discussed above, in light of their com- 
plexity and potential impact on society. 

In order for computational genetic tests on whole human
genomes to become accepted and commonplace,
efficient and (possibly) privacy-preserving versions of
such tests need to be developed. This poses a number
of challenges, which we investigate below.

Accessibility: As we discussed in Section 1, it is reasonable
to assume that, in a few years, a significant number
of individuals worldwide will have access to their
fully sequenced genome. Due to its sensitivity and size
(about 3 billion letters), one of the most difficult research
challenges is how and where genomes should
be stored. Should a genome be given to the individual
and stored on her PC? On a USB stick? On
dedicated hardware? On her smartphone? Or should 
the genome be trusted with another entity? A physi- 
cian? A healthcare provider? The health insurance 
provider? A trusted third-party cloud? Naturally, an- 
swering these questions requires a clear understanding 
of information technology as well as legal, ethical, pri- 
vacy, and ethnographic issues (which are closely con- 
nected to challenges discussed below). 

Privacy: Given its extreme sensitivity, an individual 
should, ideally, never disclose personal genomic infor- 
mation. However, one should be able to allow others 
(e.g., individuals, doctors, clinicians) to run specific genetic
tests that yield nothing beyond their intended results.
For instance, letting a testing facility run some
genetic tests should not require surrendering
one's whole genome.

In this context, additional motivations (besides ethi- 
cal and legal ones) for privacy protection stem from lia- 
bility concerns. Mere possession of a patient's sensitive 
information would require the testing entity to demon- 
strate that the information was treated appropriately and 
disposed of when no longer needed. Considering sev- 
eral recent (and rather frequent) incidents of massive 
losses of sensitive records, the entity might be unwill- 
ing to assume additional risk. 

Long-term data safety: The human genome uniquely
identifies its owner, but also discloses a lot of information
about the owner's relatives and descendants,
even several generations into the future. This raises
the problem of long-term data safety, even if human
genomes are always stored encrypted. An encryption
scheme considered strong today might gradually
weaken in the long term. Consequently, it is not too far-fetched
to imagine that a third party in possession of an
encrypted genome might be able to decrypt it, e.g., 20
or 50 years later, whereas genome sensitivity does not
dissipate over time.

Accuracy and Accountability: Computational ge- 
nomic tests should guarantee accuracy and reliability 
comparable to current (and widely accepted) lab-based 
in-vitro equivalents. For example, a software imple- 
mentation of the paternity test on fully sequenced ge- 
nomes should offer at least the same confidence as its 
in-vitro counterpart, currently admissible in a court of 
law. Also, computational tests should aim at account- 
ability, e.g., by providing guarantees that tests are run 
correctly and on intended genomic information. 

Efficiency: Computational genomic tests should incur 
minimal storage, communication, and computational 
costs. Arguably, minimality in this setting is relative 
to the context of such tests. For instance, patients may
be inclined (and used) to wait several days to obtain results
of genetic tests that concern their health. However,
in the computational setting, long waiting times might
hinder the real-world practicality of these tests (besides
removing one of the main motivations for computational
tests). Also, if a patient's genome is stored on
her PC or phone, usability of these tests may be limited,
e.g., due to connectivity and battery life issues.

Usability: Computational genomic tests that involve 
end-users should be usable by, and meaningful to, reg- 
ular non-tech-savvy individuals. This translates into 
non-trivial questions, such as: how much understand- 
ing should be expected from a user running a test? 
What information (and at what level of granularity) 
should be presented to the user as part of a test and 
as its outcome? Do privacy perceptions and concerns 
experienced by patients correspond to what the scien- 
tific community would expect? For instance, one may 
think that patients will be likely to trade off privacy of 
their genome to enable tests that can save them from, 
e.g., cancer. However, to the best of our knowledge,
no scientific study has investigated users' concerns and
(dis)proven common beliefs in this regard, thus pointing
out the need for ethnographic studies in the field.
Also, it remains an open problem to explore effective 
ways to communicate to the users the potential privacy 
risks associated with genomic information and its dis- 
closure. 

Large-scale research on human genomes: Finally,
as discussed in Section 3, potential privacy, legal, and
ethical concerns appear to conflict with enabling large-scale
research on human genomes, such as Genome-Wide
Association Studies (GWAS). One of the necessary
conditions for effective GWAS is actually the wide
availability of human genomes, e.g., in order to discover
correlations between genetic makeups and medical
conditions. Consequently, a current research trend
is to store donors' genomes on clouds and clusters of
computers and employ "big data" mining and searching
technologies (such as MapReduce) for genomic research
initiatives, like GWAS, sequence alignment,
etc. [42, 43]. Once again, related privacy and legal concerns
(also affecting relatives and descendants) remain
a challenging open problem.

5 Available Techniques for Privacy- 
Preserving Tests on Genomic Data 

Motivated by the extreme sensitivity of genomic in- 
formation, the security research community has been 
attuned to the emergence of whole genome sequencing 
and a few privacy-preserving cryptographic techniques 
have been proposed in recent literature. Alas, the ma- 
jority of them focus on (and only apply to) secure com- 
putation on DNA fragments, and not to whole genomes. 
However, a couple of recent results have begun investi- 
gating privacy-respecting tests on whole genomes. 

5.1 Secure Computation on DNA Fragments 

Results on privacy-preserving operations on DNA 
fragments are mainly of two kinds: (1) secure search- 
ing/matching DNA strings, (2) computing the similarity 
of DNA sequences. We review them in the following. 

Troncoso-Pastoriza, et al. [44] proposed an error- 
resilient privacy-preserving protocol for string 
searches. One party, on input of a DNA snippet, can 
verify the existence of a short template (e.g., a genetic 
test held by the service provider) within its (short) 
snippet. This technique handles errors and maintains 
privacy of both the template and the snippet. Each 
query is represented as an automaton executed using 
a finite state machine (FSM) in an oblivious manner. 
Secure pattern matching techniques, e.g., those
in [45] and [46], could also be applied to securely
search binary strings in a DNA snippet. Then, Katz,
et al. [47] realized secure computation of the CODIS
test [48] (run by the FBI for DNA identity testing)
and other search tests that could not be otherwise
implemented using pattern matching or FSM. Alas,
the communication and computational complexities
introduced by cryptographic operations in these techniques
are too high for real-world deployment
(even worse if one considers applying these techniques
to whole genomes).

Another set of cryptographic results focuses on privately
computing the edit distance between two strings, or
DNA snippets, α and β. Recall that edit distance is defined
as the minimum number of operations, such as
delete, insert, or replace, needed to transform α into
β. Privacy-preserving computation of Smith-Waterman
scores [49] has also been investigated and used for se- 
quence alignment. Jha, et al. [50] (and follow-up work) 
show how to securely compute edit distance using gar- 
bled circuits [51], but demonstrate that the resulting 
overhead is acceptable only for small strings. 
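
For reference, the plain (non-private) edit distance that these protocols aim to compute securely can be obtained with the standard dynamic-programming recurrence. The sketch below assumes unit cost for delete, insert, and replace; it contains no cryptography and is not the garbled-circuit construction of [50], only the function being evaluated.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character deletions, insertions, or
    replacements needed to transform string a into string b."""
    # previous_row[j] holds the distance between the processed prefix of a and b[:j].
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current_row = [i]  # transforming a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current_row.append(min(
                previous_row[j] + 1,                      # delete char_a
                current_row[j - 1] + 1,                   # insert char_b
                previous_row[j - 1] + substitution_cost,  # replace, or keep on a match
            ))
        previous_row = current_row
    return previous_row[-1]
```

The table has one cell per pair of positions, so the work grows with the product of the two string lengths, which helps explain why a secure version is costly for anything but short snippets.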

Wang, et al. [52] developed techniques for computation
on genomic data stored at a data provider, including
edit distance, Smith-Waterman, and search for
homologous genes. Program specialization is used to
partition genomic data into "public" (most of the ge- 
nome) and "sensitive" (a very small subset of the ge- 
nome). Sensitive regions are replaced with symbols 
by data providers before data consumers have access 
to genomic information. However, due to today's limited
understanding of human genomes, such a partition
into sensitive and non-sensitive information is likely to
be completely ineffective in a few years. 
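
The replacement of sensitive regions with symbols can be pictured with a small sketch. It only illustrates the masking step on the data provider's side and is not the program specialization technique of [52]; the function name, the representation of regions as half-open index intervals, and the choice of "N" as the placeholder symbol are all assumptions made for illustration.

```python
def mask_sensitive(sequence: str, sensitive_regions, symbol: str = "N") -> str:
    """Replace every position inside the given half-open intervals
    [start, end) with a placeholder symbol, keeping the length unchanged."""
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    masked = list(sequence)
    for start, end in sensitive_regions:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"invalid region: ({start}, {end})")
        masked[start:end] = symbol * (end - start)
    return "".join(masked)
```

For example, `mask_sensitive("ACGTACGT", [(2, 5)])` returns `"ACNNNCGT"`. The weakness noted above is visible here: the protection is only as good as the list of regions, and that list reflects current knowledge of which parts of the genome are sensitive.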

Finally, Franz et al. [53] show how privacy of ge- 
nomic sequences can be protected while they are ana- 
lyzed using Hidden Markov Models (HMM). HMM is 
commonly used in bioinformatics to detect certain non- 
beneficial patterns in the genome and, unsurprisingly, 
allows more powerful computations than string match- 
ing. 

5.2 Secure Testing on Whole Human Ge- 
nomes 

Baldi, et al. [54] recently introduced several cryp- 
tographic protocols for privacy-preserving testing of 
whole human genomes, including paternity tests and 
genetic screening for personalized medicine or reces- 
sive genetic diseases. In their setting, individuals ob- 
tain their genomes and allow authorized parties (e.g., 
doctors and clinicians) to run genetic tests such that 
only test results are disclosed to one or both parties
(with provable security). However, [54] only
addresses the issue of designing cryptographic protocols,
and does not deal with issues like where to
store whole sequences. To this end, their follow-up
work [55] starts tackling such challenges, and pro- 
poses a framework and an implemented toolkit, called 
GenoDroid. It incorporates several techniques offering 
efficient privacy-preserving genomic testing and runs 
on commodity Android smartphones - it is available 
at http://sprout.ics.uci.edu/projects/privacy-dna. Also, 
preliminary user studies seem to support usability and 
acceptability of proposed techniques. 

Recent work in [56, 57] also explores privacy-enhancing
technologies for medical tests on genomic
data and build a secure architecture for privacy- 
preserving disease-susceptibility tests. 

Finally, Chen, et al. [42] studied the problem of 
privacy-preserving mapping and aligning of human ge- 
nomic sequences to a reference genome, by outsourc- 
ing work to a hybrid cloud. In fact, at sequencing time, 
human genomes are read in short sequences, and these 
need to be aligned by comparing them to a reference 
genome. The work in [42] enables one to perform this 
task by outsourcing the computation to the cloud and protecting
DNA information marked as sensitive.
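
The underlying (non-private) mapping task can be illustrated with a deliberately naive sketch that reports where each short read occurs exactly in a reference string. It is not the hybrid-cloud method of [42]: there is no outsourcing and no protection of sensitive data, the function name is hypothetical, and practical aligners rely on indexes and tolerate mismatches.

```python
def map_reads(reference: str, reads) -> dict:
    """Map each read to the list of 0-based positions at which it
    occurs exactly in the reference (an empty list if it never occurs)."""
    positions = {}
    for read in reads:
        if not read:
            raise ValueError("reads must be non-empty")
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Continue searching one position further to catch overlapping hits.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions
```

For example, `map_reads("ACGTACGTTT", ["ACG", "TTT", "GGG"])` yields `{"ACG": [0, 4], "TTT": [7], "GGG": []}`. A read that maps to several positions, like "ACG" here, shows why short reads alone can be ambiguous.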

6 Conclusions 

This article presented an analysis of recent progress 
in whole genome sequencing. We first provided 
an overview of new technologies, applications, and 
biomedical advances, stimulated by the promise of 
widespread availability of complete human genomes. 
In particular, the hope for personalized medicine, i.e., 
tailoring diagnosis and treatment to patients' genetic 
makeup, has prompted a number of pioneering results. 
Then, we investigated privacy issues associated with 
human genomic information, as human genomes rep- 
resent a treasure trove of highly personal and sensitive 
information. Finally, we surveyed the state of the art in 
privacy-enhancing technologies focusing on computa- 
tional genomic tests and provided a compelling list of 
several research challenges that call for extensive work 
in this area. 